To address the insufficient mining of latent associations between remote nodes in human action recognition, as well as the high training cost incurred by multi-modal data, a multi-scale feature fusion method for human action recognition under a single modality was proposed. Firstly, global feature correlation was performed on the original human skeleton graph, so that coarse-scale global features captured the connections between remote nodes. Secondly, the global feature correlation graph was partitioned locally to obtain Complementary Subgraphs with Global Features (CSGFs), in which fine-scale features established strong local correlations, forming multi-scale feature complementarity. Finally, the CSGFs were fed into a spatial-temporal graph convolutional module for feature extraction, and the extracted results were aggregated to produce the final classification. Experimental results show that the accuracy of the proposed method on the authoritative action recognition dataset NTU RGB+D 60 reaches 89.0% (X-sub) and 94.2% (X-view). On the challenging large-scale dataset NTU RGB+D 120, the accuracy reaches 83.3% (X-sub) and 85.0% (X-setup), which is 1.4 and 0.9 percentage points higher than that of ST-TR (Spatial-Temporal TRansformer) under a single modality, and 4.1 and 3.5 percentage points higher than that of the lightweight SGN (Semantics-Guided Network). These results demonstrate that the proposed method fully exploits the synergistic complementarity of multi-scale features and effectively improves both the recognition accuracy and the training efficiency of the model under a single modality.
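The coarse-to-fine pipeline above can be illustrated with a minimal sketch: a fully connected "global correlation" graph links remote joints at the coarse scale, the joint set is then split into local subgraphs whose features are complemented by the coarse-scale output (the CSGF idea), and the subgraph results are aggregated. All joint counts, the body-part split, and the helper names here are illustrative assumptions, not the paper's actual architecture.

```python
# Hedged sketch of the multi-scale fusion pipeline; the partition scheme,
# weights, and feature sizes are toy assumptions for illustration only.

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def graph_conv(adj, feats, weight):
    """One graph-convolution step: aggregate neighbours, then project."""
    return matmul(matmul(adj, feats), weight)

# Toy skeleton: 5 joints in a chain (tridiagonal adjacency, self-loops).
N = 5
skeleton = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(N)]
            for i in range(N)]

# Coarse scale: a fully connected global-correlation graph, so remote
# joints (e.g. hand and foot) exchange information directly.
global_adj = [[1.0 / N] * N for _ in range(N)]

# Fine scale: partition the joints into overlapping local subgraphs.
parts = [[0, 1, 2], [2, 3, 4]]        # assumed body-part split

feats = [[float(i)] for i in range(N)]  # one scalar feature per joint
w = [[0.5]]                             # toy projection weight

coarse = graph_conv(global_adj, feats, w)  # coarse-scale global features

scores = []
for part in parts:
    sub_adj = [[skeleton[i][j] for j in part] for i in part]
    # Complement local features with the coarse-scale global ones (CSGF).
    sub_feats = [[feats[i][0] + coarse[i][0]] for i in part]
    out = graph_conv(sub_adj, sub_feats, w)
    scores.append(sum(v[0] for v in out))

# Aggregate the subgraph outputs into a final score (a stand-in for the
# classifier head that fuses the two scales).
final = sum(scores) / len(scores)
print(final)
```

In the full method each subgraph would pass through a spatial-temporal graph convolutional module over a sequence of frames before aggregation; this sketch collapses that to a single spatial step to keep the multi-scale data flow visible.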